Triton inference server support #26
base: main
Conversation
```python
from googleapiclient.discovery import build


class Tool:
```
Needs an update function that can be called; the retrieval and calendar tools will need this.
What should the update do?
Pass in the data; the update will use the text, url, etc. to set up for the generation.
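A minimal sketch of what such an update hook might look like (the class layout and attribute names here are illustrative, not the PR's actual API):

```python
class Tool:
    """Base class for tools the model can call during annotation."""

    def update(self, data: dict) -> None:
        # Hypothetical hook: receive the current datapoint (text, url,
        # date, ...) so the tool can prepare state before generation.
        raise NotImplementedError


class Calendar(Tool):
    def update(self, data: dict) -> None:
        # e.g. stash the document's date so calendar calls answer
        # relative to the right "today".
        self.today = data.get("date")
```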
```python
selected_start_tokens = probs[:, insert_api_at, start_tokens].argmax().item()

for i in range(m):
    _, api_calls = await model(
```
Need to add the tool in here before sending, e.g. [Calendar(
Hmm, shouldn't that already be in the annotation prompt?
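For reference, a minimal sketch of what prepending the tool prefix could look like (the helper name and example prompt are hypothetical):

```python
def with_tool_prefix(prompt: str, tool_name: str) -> str:
    # Hypothetical helper: append the opening token sequence of a tool
    # call so the model continues with the call's arguments instead of
    # deciding on its own whether (and which tool) to call.
    return f"{prompt} [{tool_name}("

annotation_prompt = "Today is the first Friday of the year."
print(with_tool_prefix(annotation_prompt, "Calendar"))
# -> Today is the first Friday of the year. [Calendar(
```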
```python
iter_data = islice(iter_data, max_datapoints)


async def sample_and_filter_api_calls(tool, text, top_k, n_gen):
    async for tool_use in sample_api_calls(
```
Should chunk this; otherwise we're cutting off everything past 512 (?) tokens.
Also needs a tool update function for the current data.
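A rough sketch of the kind of chunking meant here, using overlapping windows so annotations near a boundary keep some left context (the window and stride sizes are assumptions):

```python
def chunk_tokens(token_ids, max_len=512, stride=384):
    # Yield overlapping windows of at most max_len tokens; the
    # max_len - stride overlap (here 128 tokens) preserves context
    # across chunk boundaries.
    start = 0
    while True:
        yield token_ids[start:start + max_len]
        if start + max_len >= len(token_ids):
            break
        start += stride

# e.g. a 1000-token document becomes three overlapping <=512-token chunks
chunks = list(chunk_tokens(list(range(1000))))
```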
Cool with putting this into a dev branch and working on the stuff I found independently if that sounds good, or we can update this.
This PR adds support for the Triton inference server. Might be worth @dmahan93 or @conceptofmind trying it out to verify. Also maybe worth adding a non-Triton `infer_model` function that just loads the `.pt` file and runs it in process.
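A minimal sketch of such an in-process alternative, assuming the `.pt` file holds a full pickled `nn.Module` rather than a bare state dict (function names are placeholders):

```python
import torch

def load_local_model(checkpoint_path: str, device: str = "cpu"):
    # Hypothetical in-process alternative to the Triton client: load the
    # serialized module directly and run it in this process.
    model = torch.load(checkpoint_path, map_location=device)
    model.eval()
    return model

@torch.no_grad()
def infer_model(model, input_ids: torch.Tensor) -> torch.Tensor:
    # Return the model's raw outputs (e.g. next-token logits); the exact
    # shape depends on the model class saved in the checkpoint.
    return model(input_ids)
```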